AIbase
# Zero-shot Video Retrieval

LLaVE-7B
Apache-2.0
LLaVE-7B is a 7-billion-parameter multimodal embedding model based on LLaVA-OneVision-7B. It produces embeddings for text, images, multiple images, and videos.
Multimodal Fusion, Transformers, English
zhibinlan
1,389
5
LLaVE-2B
Apache-2.0
LLaVE-2B is a 2-billion-parameter multimodal embedding model based on Aquila-VL-2B. It has a 4K-token context window and produces embeddings for text, images, multiple images, and videos.
Text-to-Image, Transformers, English
zhibinlan
20.05k
45
LLaVE-0.5B
Apache-2.0
LLaVE-0.5B is a 0.5-billion-parameter multimodal embedding model based on LLaVA-OneVision-0.5B. It produces embeddings for text, images, multiple images, and videos.
Multimodal Fusion, Transformers, English
zhibinlan
2,897
7
© 2025 AIbase